Retinol is one of the most biologically active forms of vitamin A and is hypothesized to influence a wide range 
of human diseases including asthma, cardiovascular disease, infectious diseases and cancer. We conducted 
a genome-wide association study of 5006 Caucasian individuals drawn from two cohorts of men: the Alpha- 
Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study and the Prostate, Lung, Colorectal, and Ovarian 
(PLCO) Cancer Screening Trial. We identified two independent single-nucleotide polymorphisms associated 
with circulating retinol levels, which are located near the transthyretin (TTR) and retinol binding protein 4 
(RBP4) genes which encode major carrier proteins of retinol: rs1667255 (P = 2.30 × 10^-17) and rs10882272 
(P= 6.04 X 10~^^). We replicated the association with rs10882272 in RBP4 in independent samples from 
the Nurses' Health Study and the Invecchiare in Chianti Study (InCHIANTI) that included 3792 women and 
504 men (P= 9.49 x 10"^), but found no association for retinol with rs1667255 in TTR among women, thus 
suggesting evidence for gender dimorphism (P-interaction = 1.31 × 10^-5). Discovery of common genetic variants 
associated with serum retinol levels may provide further insight into the contribution of retinol and 
other vitamin A compounds to the development of cancer and other complex diseases. 



INTRODUCTION 

Retinol, one of the most biologically active forms of vitamin 
A, has been hypothesized to influence a wide range of 
human diseases, including cardiovascular disease, infectious 
diseases and cancer at multiple sites (1). For example, in addition 
to the classic etiologic role of vitamin A deficiency in 
xerophthalmia, a pathologic dryness of the eyes which can 
lead to blindness (2), higher vitamin A status has been related 
to an increased risk of cardiovascular disease (3) and, most re- 
cently, to increased prostate cancer risk (4). Circulating con- 
centrations of retinol are influenced by many factors in 
addition to diet and supplements, including gut absorption, 
cleavage of pre-retinol carotenoid compounds, transport and 
hepatic storage/release. Identification of common variants 
that influence retinol levels may help shed light on the etiology 
of several diseases, including cancer. 

There is evidence that genetic variants influence circulating 
retinol. Family studies have estimated that 30% of the vari- 
ation in serum retinol is heritable (5). One case study demon- 
strated that a mutation in the gene encoding retinol-binding 
protein (RBP), one of the two major transport proteins for 
retinol in circulation, resulted in abnormally low retinol con- 
centrations (6), while inactivation of the transthyretin (TTR) 
gene, the other major retinol transport protein, also resulted 
in hypovitaminosis A in mice (7). One genome-wide association 
study (GWAS) recently examined the genetics of 
serum retinol concentration in a population without conditions 
associated with retinol deficiency or TTR abnormalities, but 
failed to identify variants related to serum retinol at the 
genome-wide significance level (8). Thus, it remains unclear 
whether common single-nucleotide polymorphisms (SNPs) 
can explain variations in retinol concentration within the 
normal range. We therefore conducted a GWAS, similar to 
recent studies of other micronutrient traits (9,10), that 
pooled data from two cohort studies, and replicated our find- 
ings in two other independent cohorts. 



RESULTS 

The individual SNP-serum retinol association P-values from 
the initial GWAS are plotted by chromosome in Supplementary 
Material, Figure S1. There were 10 SNPs on chromosomes 
10 and 18 that reached genome-wide significance 
(P < 5 × 10^-8) for association with circulating retinol concentration. 
We examined the association between all SNPs 
with initial-scan P-values below 10~^ (n= 121) in the replica- 
tion set; no additional SNPs reached genome-wide signifi- 
cance after meta-analysis with these data. The two highly 
significant SNPs from chromosome 10 were in the gene neigh- 
borhood of the RBP4 gene, which encodes retinol-binding 
protein 4 (RBP4), one of the two major carriers of retinol in 
serum. The strongest signal in this region was for 
rs10882272 (P = 6.04 x 10"'^ Table 1 and Fig. 1). In the 
pooled analysis, the significance of an additional SNP that 
reached genome-wide significance in this region 
(rs11187545) was greatly attenuated when it was included in 
the conditional regression model with rs10882272, showing 
no evidence for independence and suggesting that the signals 
from the two SNPs originate from a common locus (even 
though the two SNPs are not highly correlated; r^2 = 0.15) 
(Fig. 1). Also shown in the figure are the recombination 
hotspots, which support the signal most likely being from 
one allele. The underlying linkage disequilibrium, as demonstrated 
by both D' = 0.83 and the haplotype structure, indicates 
that the variants are well correlated and differ slightly in 
minor allele frequency. 



rs10882272 was independently significant in two of the 
replication sets (NHS-CGEMS, P = 0.003; and InCH-males, 
P = 0.025), as well as in the replication sets combined 
(P = 9.49 X 10~^), and it remained highly significant in the 
meta-analysis of the GWAS and replication data (P = 6.51 x 
10"'^) (Table 1). There was no heterogeneity across studies 
for this SNP (P = 0.67, Table 1). The estimated relative difference 
in mean retinol levels per copy of the rs10882272-C 
allele from the overall meta-analysis was a decrease of 3.0% 
(Table 1), and we estimated that this SNP accounted for 
0.5% of the variation in serum retinol levels. 
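
Because the regression coefficients are estimated on natural-log transformed retinol, a per-allele beta can be converted to a relative difference through exp(beta) - 1 (strictly, a ratio of geometric means). The sketch below illustrates that conversion only; the function name is hypothetical, and the input of -0.03 is the rounded per-allele log-scale estimate for rs10882272 quoted elsewhere in the Results, so the output is approximate.

```python
import math


def percent_difference(beta_log, copies=1):
    """Relative difference (%) in retinol for `copies` alleles, given a
    per-allele regression coefficient on the natural-log scale."""
    return (math.exp(beta_log * copies) - 1.0) * 100.0


# Rounded per-allele log-scale estimate for rs10882272 taken from the text.
print(round(percent_difference(-0.03), 1))
```

With the rounded input this gives a per-allele decrease of roughly 3%, in line with the estimate reported above.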

We observed a cluster of eight SNPs on chromosome 18 
that were significantly associated with serum retinol in the 
initial GWAS (P < 5 × 10^-8). These SNPs are near TTR, 
the gene encoding TTR, which dimerizes with RBP4 and is 
therefore also involved in retinol (as well as thyroid 
hormone) transport in circulation. The strongest signal in 
this region was for rs1667255 (Table 1 and Fig. 2). When 
the other seven SNPs that reached genome-wide significance 
in the pooled analysis (i.e. rs1667254, rs1616887, 
rs1667234, rs723744, rs4799585, rs9304102, rs1621308) 
were included in the regression model with rs1667255, their 
significance was greatly reduced, indicating signal from a 
common locus. rs1667255 did not reach statistical significance 
in the replication data set (P = 0.08, Table 1), but did exhibit 
highly significant heterogeneity in the strength of association 
across studies (P = 0.0005, Table 1), with similar magnitude 
of association in the ATBC and PLCO male cohorts, somewhat 
attenuated beta for two of the replication studies 
(NHS-CHD and InCH-males), and lack of association in the 
other replication studies of women. The combined 
meta-analysis yielded significance at the P = 6.35 x 10"'"^ 
level (Table 1), and the random effects meta-analysis 
P-value was 0.085. Combining ATBC and PLCO cohorts, 
the estimated beta and standard error are 0.039 and 
0.0046, while in the NHS and InCH-female set the estimated 
beta and standard error are 0.0065 and 0.0059. These 
yield a formal z-test P-value of 1.31 × 10^-5 for the difference 
in the strength of SNP association between the male and 
female samples. Comparing those with two copies of the 
minor allele to those with zero copies, the difference in mean 
retinol ranged from −2.3 to 8.7% across the GWAS and replication 
cohorts (Table 1), and we estimated in the pooled 
PLCO and ATBC studies that this SNP accounted for 1.7% 
of the variation in circulating retinol levels. These results 
were unchanged when we restricted the analysis to participants 
who did not use supplemental vitamin A of any kind (data not 
shown). 
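
The z-test for the male-female difference described above can be reproduced approximately from the two quoted estimates. The following is a minimal sketch, assuming the two estimates are independent and approximately normal; the function name is hypothetical, and because the inputs are the rounded values from the text the result will only be close to the reported P-value.

```python
import math


def z_test_difference(beta1, se1, beta2, se2):
    """Two-sided z-test for the difference between two independent estimates."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Pooled male discovery cohorts vs. NHS and InCH-female sets (values from the text).
z, p = z_test_difference(0.039, 0.0046, 0.0065, 0.0059)
print(f"z = {z:.2f}, P = {p:.2e}")
```

The rounded inputs yield a P-value on the order of 10^-5, consistent with the formal test reported above.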

We conducted an analysis in the pooled discovery GWAS 
data combining the two identified SNPs (rs10882272 on 
chromosome 10, and rs1667255 on chromosome 18) and 
found that individuals with two copies of both variant alleles 
(i.e. rs1667255: C/C, and rs10882272: C/C) had 12.7-15.1% 
higher serum retinol than those who were homozygous for 
the common allele for both SNPs (ATBC: 15.1%, 95% CI: 
14.3-18.4%; PLCO: 12.7%, 95% CI: 11.6-13.7%). We estimated 
that the two SNPs together accounted for 2.3% of the 
variance in serum retinol levels. A sensitivity analysis con- 
ducted among the 2184 controls from the ATBC and PLCO 
nested sets revealed identical regression parameter estimates 
to those obtained using all participants (per minor allele 
change in log retinol levels = −0.03 and 0.04 for 
rs10882272 and rs1667255, respectively). 

NCBI dbSNP identifier. 
NCBI Human Genome Build 36 location. 
Result of the 1 df test based on linear regression using an additive model and adjusting for case status, age at blood collection (continuous), BMI (continuous), serum cholesterol (continuous), study and 
eigenvectors chosen to adjust for population stratification. Meta-analysis P-values were based on the combined Wald statistics weighted by the square-root of the associated sample size. 
SE, standard error. The regression β and its standard error are calculated using natural-log transformed retinol. The meta-analysis β and standard error are calculated using the fixed-effect model. 
Assessed using Cochran's Q statistic. 

Figure 1. LD structure of chromosome 10. P-values generated from ATBC and PLCO data. LR, the recombination rate on a logarithmic scale with 12 being 
'notable' for a hotspot. 

Figure 2. LD structure of chromosome 18. P-values generated from ATBC and PLCO data. LR, the recombination rate on a logarithmic scale with 12 being 
'notable' for a hotspot. 

The results above were unchanged when participants using 
supplemental vitamin A from either individual supplements or 
multivitamins (i.e. 10% in ATBC, 37% in PLCO and 41% in 
NHS) were excluded from the analysis (data not shown). We 
also tested the gene-serum retinol associations within strata 
of vitamin A supplement use: the betas, SEs and P-values 
for RBP4 rs10882272 among vitamin A supplement users 
and non-users in our GWAS sample were −0.030/0.012/ 
P = 0.008 and −0.033/0.005/P = 1.41 x 10"^°, respectively. 
Similarly for TTR rs1667255, they were 0.040/0.012/ 
P = 0.001 and 0.041/0.005/P = 3.38 x 10"'^ respectively. 
The gene-serum retinol associations were also similar for 
RBP4 for higher and lower baseline serum retinol strata 
(−0.010/0.005/P = 0.025 and −0.013/0.004/P = 0.002, 
respectively), but showed signal only in the above-the-median 
group for TTR (0.019/0.004/P = 1.31 x 10"^) and not those 
with lower serum retinol levels (0.006/0.005/P = 0.24). 



DISCUSSION 

In this GWAS, we identified two distinct regions that influence 
circulating retinol levels. The SNPs showing the strongest signal 
localize to regions that include the biologically plausible candi- 
date genes, RBP4 and TTR, which encode the two major carrier 
proteins involved in retinol transport in circulation. Previous 
studies found mutations in the RBP4 and TTR genes that resulted 
in abnormally low retinol levels (6,7); however, no variants have 
been associated with altered retinol within the normal range, in- 
cluding from a previous GWAS that found no variants asso- 
ciated at the genome-wide significance level (8). Although our 
analyses suggest that the multiple SNPs we found in association 
with serum retinol in each gene neighborhood were from 
common loci, there may be additional functional loci in each 
of these genes that remain to be found through further study, in- 
cluding fine mapping. 

Our results differed from those of the one previous GWAS of 
serum retinol concentration, which identified no variants asso- 
ciated with serum retinol at the genome-wide significance level 
(8). One possible explanation for this difference is that the previous 
study was likely underpowered to detect associations of the 
magnitude that we observe here; i.e. with an initial GWAS 
sample size of n = 1242, they had ~13% power to detect a 
SNP explaining 1.5% of the variation in serum retinol levels. 

Table 2. Descriptive characteristics of the participating GWAS cohorts 

Cohort                                                ATBC               PLCO   NHS   InCH 

No. of cases/no. of controls                          2336/1678 
Location                                              Finland 
Median age (years)                                    58 (54-62) 
Sex (% male)                                          100 
Smoking (% current)                                   100 
Dietary vitamin A intake (IU/day)                     4203 (2711-6602) 
Supplemental vitamin A use (% yes)                    10 
Years of blood collection                             1985-1988 
Blood sample                                          Serum 
Median (interquartile range) of serum retinol (μg/l)  572 (496-654) 
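
The power figure quoted for the earlier GWAS can be checked roughly with a normal approximation to the 1 df association test. This is a sketch under stated assumptions, not the authors' calculation: the function name is hypothetical, the non-centrality parameter is taken as n × R² / (1 − R²), and the genome-wide threshold is assumed to be α = 5 × 10^-8.

```python
from statistics import NormalDist


def gwas_power(n, r2, alpha=5e-8):
    """Approximate power of a 1-df test for a SNP explaining a
    fraction r2 of trait variance in a sample of size n."""
    norm = NormalDist()
    ncp = n * r2 / (1.0 - r2)                  # non-centrality parameter
    shift = ncp ** 0.5                         # expected Wald z-statistic
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    # Probability that |Z + shift| exceeds the critical value (both tails).
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Earlier GWAS: n = 1242, SNP explaining 1.5% of variance, genome-wide threshold.
print(f"{gwas_power(1242, 0.015):.3f}")
# NHS replication: n = 2772, SNP explaining 1.7% of variance, tested at P = 0.05.
print(f"{gwas_power(2772, 0.017, alpha=0.05):.3f}")
```

Under these assumptions the first value falls close to the ~13% quoted above, and the second exceeds 99%.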

It should also be noted that for the SNP in the TTR gene 
neighborhood associated with serum retinol (rs1667255), 
there was highly significant heterogeneity between the discov- 
ery and replication cohorts. It seems unlikely that systematic 
differences across the studies, for example in genotyping or 
laboratory analyses, could account for this difference. It is 
notable that for the other SNP associated with circulating 
retinol, rs10882272, there was no evidence of heterogeneity 
and the findings were consistent across the studies. Another 
possible explanation has to do with the discovery cohorts con- 
sisting only of men, whereas women made up 88% of the rep- 
lication samples, with the one male replication set showing a 
borderline non-significant association (P = 0.06). The gender 
difference between the discovery and replication sets may 
also explain why rs1667255 failed to reach statistical significance 
in the NHS replication samples, despite having adequate 
power to detect an association (i.e. 99% power to detect a SNP 
explaining 1.7% of the variation in serum retinol at P = 0.05). 
This heterogeneity raises the possibility of a biologically based 
sex difference in the transport of retinol (and possibly thyroid 
hormones) similar to gender dimorphisms observed for some 
other traits, including childhood obesity (11) and adult 
waist-to-hip ratio (12). The suggested gender dimorphism in 
the association between SNPs in the TTR gene neighborhood 
and serum retinol concentration should be examined in add- 
itional studies. 

Our investigation has many strengths, including large sample 
size, an independent population for replication, and measure- 
ment of serum retinol using the accepted, gold standard assay 
method across studies. The present analysis included only Cau- 
casians, however, and re-examination in populations of Asian 
and African descent is warranted. Because retinol deficiency 
is not common in developed populations, too few individuals 
had evidence of retinol deficiency (i.e. n = 25 at <300 μg/l); 
therefore, this specific outcome could not be evaluated. 



Conclusions 

In this GWAS, variants near the genes encoding two major 
proteins involved in the transport of retinol and other 
vitamin A compounds in circulation were associated with 
retinol concentration, and for one (TTR), gender dimorphism 
was suggested. Given the known associations between circu- 
lating retinol and human disease, understanding the underlying 
mechanisms that determine retinol concentrations and status 
may provide insight into how retinol transport and metabolism 
are related to disease risk, including cancer. Future studies 
should examine these findings in other populations. 

MATERIALS AND METHODS 

Sample description: discovery samples 

We conducted a GWAS analysis based on two cohorts with 
prospectively collected serum retinol levels: (i) the Alpha- 
Tocopherol, Beta-Carotene Cancer Prevention (ATBC) 
Study, a randomized trial of α-tocopherol and β-carotene for 
cancer prevention that was conducted among male smokers 
in southwestern Finland (13), and (ii) the Prostate, Lung, Colorectal, 
and Ovarian (PLCO) Cancer Screening Trial (14), a 
multi-center cancer screening effectiveness trial conducted in 
the USA which included both smokers and non-smokers. 
Details of the participants from each cohort are presented in 
Table 2. In the ATBC Study, participants were previously 
selected for nested case-control sets to conduct GWAS ana- 
lyses to identify genetic determinants of lung, pancreatic, 
bladder and advanced prostate cancers. The present analyses 
were conducted in these participants (n = 4014), for whom 
genome-wide scans and information on serum retinol exist. 
In PLCO, participants were men who were previously selected 
for a nested case-control set to conduct GWAS analyses to 
identify genetic determinants of prostate cancer; 992 partici- 
pants with existing genetic and serum retinol data were avail- 
able for the present study. 

Sample description: replication samples 

We attempted to replicate in two independent cohorts the most 
significant findings (all SNPs with P < 10"^, n = 121) from the 
initial GWAS: 2772 women from the Nurses' Health Study 
(NHS) who were previously genotyped as part of three 
nested case-control GWAS studies of coronary heart 
disease (CHD), type 2 diabetes (T2D) and breast cancer 
[Cancer Genetic Markers of Susceptibility (CGEMS)] 
(15,16); and, 1124 men and women from the InCHIANTI 
Study (InCH) (8). 

Genotyping and serum retinol measurement 

All genotyping for the initial GWAS was conducted at the 
National Cancer Institute Core Genotyping Facility (CGF). 
The discovery samples (ATBC Study and PLCO) were genotyped 
using the Illumina HumanHap550/610 arrays. The replication 
samples from NHS were genotyped on either the 
Illumina HumanHap550 (breast cancer) or Affymetrix 6.0 
(CHD and T2D) platforms; this genotyping was performed 
at the CGF (breast cancer), Merck laboratories (CHD) and 
the Broad Institute (T2D). Replication study samples from 
InCH were genotyped on the Illumina HumanHap550 at the 
Laboratory of Neurogenetics of the National Institute on 
Aging. Imputation was performed for SNPs not available 
from genotyping using the hidden-Markov model algorithm 
implemented in MACH using the HapMap CEU reference 
panel build 36, R22. The majority of the imputed SNPs had 
a high imputation quality score (MACH-R2 > 0.6). Details of 
the quality-control assessment of genotypes including the 
sample completion rates, SNP call rates, concordance rates, 
tests for deviation from Hardy-Weinberg proportions and 
final sample selection for analyses are described elsewhere 
(15-19). 

In the ATBC Study, retinol concentration was determined in 
baseline fasting serum samples using reversed-phase liquid 
chromatography with diode-array UV detection (20). After 
ethanol/ether extraction and injection into a Hypersil ODS 
column with isocratic methanol mobile phase and flow rate 
of 0.9 ml/min for 9 min run, retinol was monitored at 
305 nm wavelength. All samples were protected from light 
and stored at −70°C until they were assayed. The coefficient 
of variation (CV) for the retinol measurement was 2.2%. 



PLCO serum samples collected at study entry were processed 
and stored within 2 h of collection and were also stored at 
−70°C and measured using reversed-phase liquid chromatography 
with diode-array UV detection. After ethanol/hexane 
extraction, injection into an Adsorbosphere HS C column 
with a step gradient to acetonitrile-methanol-dichloro- 
methane mobile phase, and flow rate of 0.9 ml/min for 
23 min, retinol was monitored at 325 nm wavelength (21). Al- 
though the PLCO participants were not required to fast prior to 
blood collection, fasting and non-fasting serum retinol values 
are not significantly different (22). The CV for serum retinol in 
PLCO was 5.1%. NHS non-fasting plasma samples were 
stored at −130°C and retinol was determined by reversed- 
phase high-performance liquid chromatography with UV 
detection as described by El-Sohemy et al. (23) with modifica- 
tions. Samples were processed by organic phase extraction in 
hexane and retinol was isolated from the extract using a C18 
column (150 mm × 4.6 mm, 3 μm particle size) with a 
mobile phase mixture of acetonitrile, tetrahydrofuran, metha- 
nol and 1% ammonium acetate solution (68:22:7:3) flowing 
at 1 ml/min. Concentration was determined at 300 nm wave- 
length (23). Retinol was run in 14 batches; CVs were <11% 
for all except 3 batches which had CVs ranging from 16 to 
24% (N = 663 samples). Plasma and serum retinol concentrations 
have been shown to be quite comparable (24). Blood 
samples in InCH were collected in the morning after a 12 h 
overnight fast and aliquots of plasma were stored at −80°C. 
Retinol was measured using an isocratic high-performance 
liquid chromatography method as described by Sowell et al. 
(25). Retinol was isolated using a 150 × 4.6 mm octadecylsilane 
column packed with 5 μm particles with a mobile phase 
of an equivolume solution of ethanol and acetonitrile 
containing 0.1 ml of diethylamine per liter of solvent. The 
flow rate was 0.9 ml/min and the concentration was deter- 
mined at 325 nm wavelength. The within-run and between-run 
coefficients of variation were 3.3 and 2.8%. 



Statistical analysis 

For the initial GWAS in the ATBC and PLCO, we performed 
linear regression assuming an additive genetic model of the 
genotype data and adjusting for cancer case status, age at 
blood collection (continuous), body mass index (BMI) (con- 
tinuous), serum cholesterol (continuous), study and eigenvec- 
tors chosen to adjust for population stratification. When 
combining SNPs from two genes for analysis, we created a 
score indicating the total number of alleles across SNPs con- 
ferring higher retinol and performed linear regression assum- 
ing an additive genetic model over the number of alleles 
adjusting for the same factors listed above. Retinol concentra- 
tions were transformed by taking the natural log, which was 
identified as the optimal transformation using the Box-Cox 
procedure. The Wald test was used to test the association 
between each SNP (n = 562 105) and serum retinol. All ana- 
lyses were conducted using the R software (version 2.10.1). 
Results from the quantile-quantile plot of P-values from the 
pooled analysis of the GWAS cohorts indicated no systematic 
type-I error inflation (λGC = 1.018, Fig. 3). 
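
The per-SNP test described above can be illustrated with a stripped-down version of the additive model. This is only a sketch: it regresses natural-log retinol on minor-allele count (0, 1 or 2) without the covariates and eigenvectors used in the actual analysis, uses a normal approximation for the Wald P-value, and runs on made-up data; the function name and all values are hypothetical.

```python
import math
from statistics import NormalDist


def additive_snp_regression(genotypes, retinol):
    """Ordinary least squares of ln(retinol) on allele count (0/1/2).
    Returns the per-allele beta, its standard error, the Wald z and a
    two-sided P-value (normal approximation, no covariates)."""
    y = [math.log(v) for v in retinol]
    n = len(y)
    mean_g = sum(genotypes) / n
    mean_y = sum(y) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (yi - mean_y) for g, yi in zip(genotypes, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((yi - intercept - beta * g) ** 2 for g, yi in zip(genotypes, y))
    se = math.sqrt(rss / (n - 2) / sxx)   # residual variance over Sxx
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


# Hypothetical genotypes and serum retinol values (μg/l), for illustration only.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
retinol = [610, 585, 600, 575, 590, 560, 565, 540, 555]
beta, se, z, p = additive_snp_regression(genotypes, retinol)
print(f"beta = {beta:.4f}, SE = {se:.4f}, z = {z:.2f}, P = {p:.3g}")
```

In the full analysis the same coefficient would be estimated from a multiple regression that also includes case status, age, BMI, serum cholesterol, study and the stratification eigenvectors.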

The five replication studies were analyzed separately using 
linear regression adjusted for case-control status, age at blood 
collection, BMI and serum cholesterol. Serum retinol was 
transformed by taking the natural log and then adjusted for 
batch. Results across the replication studies were synthesized 
using a fixed-effects model; heterogeneity in SNP-retinol asso- 
ciations was assessed using Cochran's Q statistic. 

For the meta-analysis combining the discovery and replica- 
tion studies, we combined the study-specific beta estimates for 
the discovery (pooled ATBC and PLCO samples) and the five 
replication studies (T2D, CHD, CGEMS, InCH-male and 
InCH-female) using a fixed-effects model; heterogeneity in 
SNP-retinol associations was assessed using Cochran's Q 
statistic. 
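
A fixed-effects combination with Cochran's Q can be sketched as follows. This assumes the common inverse-variance weighting, which the text does not spell out, and the function name is hypothetical. The inputs are the rounded male (discovery) and female (replication) estimates for rs1667255 quoted in the Results, used purely for illustration rather than to reproduce Table 1.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooled estimate and Cochran's Q.
    Returns (pooled beta, pooled SE, Q, degrees of freedom)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Q sums the weighted squared deviations of each study from the pooled
    # estimate; it is compared against a chi-square with k - 1 df.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, pooled_se, q, len(betas) - 1


pooled, pooled_se, q, df = fixed_effects_meta([0.039, 0.0065], [0.0046, 0.0059])
print(f"beta = {pooled:.4f}, SE = {pooled_se:.4f}, Q = {q:.2f} on {df} df")
```

With only two inputs, Q equals the square of the z-statistic for their difference, so a large Q here reflects the same male-female contrast discussed in the Results.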